BUG: bug in GroupBy.count where arg minlength passed to np.bincount must be None for np<1.13#21957
Conversation
| def test_count_with_only_nans_in_first_group(self): | ||
| # GH21956 | ||
| df = DataFrame({'A': [np.nan, np.nan], 'B': ['a', 'b'], 'C': [1, 2]}) | ||
| result = df.groupby(['A', 'B']).C.count() |
There was a problem hiding this comment.
can u do an assert_produces_warning()
here (should not show a warning)
There was a problem hiding this comment.
This does not produce any warnings for me.
What should look for in travis? Grep "SeriesGroupBy.count"?
There was a problem hiding this comment.
just look at the warnings in the 3.6 log
There was a problem hiding this comment.
Ok, will look in the morning.
There was a problem hiding this comment.
ok, can you assert that this produces no warning (e.g. it IS producing a warning now in numpy < 1.13)
doc/source/whatsnew/v0.24.0.txt
Outdated
| - Bug in :func:`pandas.core.groupby.GroupBy.first` and :func:`pandas.core.groupby.GroupBy.last` with ``as_index=False`` leading to the loss of timezone information (:issue:`15884`) | ||
| - Bug in :meth:`DatetimeIndex.resample` when downsampling across a DST boundary (:issue:`8531`) | ||
| - | ||
| - Bug in :func:`pandas.core.groupby.GroupBy.count` when using numpy < 1.13 and ngroups=0 (:issue:`21956`). |
There was a problem hiding this comment.
a note is not necessary this is not user visible
There was a problem hiding this comment.
The bug shown in #21956 is user visible...
pandas/core/groupby/generic.py
Outdated
| mask = (ids != -1) & ~isna(val) | ||
| ids = ensure_platform_int(ids) | ||
| out = np.bincount(ids[mask], minlength=ngroups or 0) | ||
| minlength = ngroups or 0 |
There was a problem hiding this comment.
Remove the " or 0" part?
There was a problem hiding this comment.
I've made this a one-liner in the newest update.
2d4ab97 to
d65615b
Compare
Codecov Report
@@ Coverage Diff @@
## master #21957 +/- ##
=========================================
Coverage ? 92.02%
=========================================
Files ? 170
Lines ? 50710
Branches ? 0
=========================================
Hits ? 46664
Misses ? 4046
Partials ? 0
Continue to review full report at Codecov.
|
91343df to
97ead75
Compare
pandas/core/groupby/generic.py
Outdated
| mask = (ids != -1) & ~isna(val) | ||
| ids = ensure_platform_int(ids) | ||
| out = np.bincount(ids[mask], minlength=ngroups or 0) | ||
| minlength = ngroups or (None if _np_version_under1p13 else 0) |
There was a problem hiding this comment.
what I meant is can you just pass None always
There was a problem hiding this comment.
I think you are right, th doc string says "minlength : int, optional" and it accepts None for numpy >1.13. I'll change it.
doc/source/whatsnew/v0.23.4.txt
Outdated
| **Groupby/Resample/Rolling** | ||
|
|
||
| - Bug where calling :func:`DataFrameGroupBy.agg` with a list of functions including ``ohlc`` as the non-initial element would raise a ``ValueError`` (:issue:`21716`) | ||
| - Bug in :func:`pandas.core.groupby.SeriesGroupBy.count` when using numpy < 1.13 and ngroups=0 (:issue:`21956`). |
There was a problem hiding this comment.
is there anything user visible atm?
There was a problem hiding this comment.
Yes, as the per the example in #21956:
>>> df = pd.DataFrame({'A': [np.nan, np.nan], 'B': ['a', 'b'], 'C':[1,2]})
>>> df.groupby(['A', 'B']).C.count()
ValueError: minlength must be positive # when numpy <1.132b202e0 to
b250bce
Compare
doc/source/whatsnew/v0.23.4.txt
Outdated
|
|
||
| - Bug where calling :func:`DataFrameGroupBy.agg` with a list of functions including ``ohlc`` as the non-initial element would raise a ``ValueError`` (:issue:`21716`) | ||
| - Bug in ``roll_quantile`` caused a memory leak when calling ``.rolling(...).quantile(q)`` with ``q`` in (0,1) (:issue:`21965`) | ||
| - Bug in :func:`pandas.core.groupby.SeriesGroupBy.count` when using numpy < 1.13 and ngroups=0 (:issue:`21956`). |
There was a problem hiding this comment.
again, is this actually a user facing change? (not against the note), just this is not understandable to user.
There was a problem hiding this comment.
Ok, new explanation uploaded.
b250bce to
eea3c76
Compare
| def test_count_with_only_nans_in_first_group(self): | ||
| # GH21956 | ||
| df = DataFrame({'A': [np.nan, np.nan], 'B': ['a', 'b'], 'C': [1, 2]}) | ||
| result = df.groupby(['A', 'B']).C.count() |
There was a problem hiding this comment.
ok, can you assert that this produces no warning (e.g. it IS producing a warning now in numpy < 1.13)
|
@jreback I got a straight up ValueError in master, so no warning. That ValueError is not in this PR, obviously I've got a feeling I don't understand you. Could you give an example where a warning is emitted? |
eea3c76 to
4e59555
Compare
4e59555 to
5c978e2
Compare
| @@ -32,6 +32,8 @@ Bug Fixes | |||
|
|
|||
| - Bug where calling :func:`DataFrameGroupBy.agg` with a list of functions including ``ohlc`` as the non-initial element would raise a ``ValueError`` (:issue:`21716`) | |||
There was a problem hiding this comment.
move this to 0.24.0, as this wont' backport cleanly.
5c978e2 to
49f30c2
Compare
|
Hello @topper-123! Thanks for updating the PR. Cheers ! There are no PEP8 issues in this Pull Request. 🍻 Comment last updated on July 26, 2018 at 12:58 Hours UTC |
49f30c2 to
17ad763
Compare
17ad763 to
32a5fcf
Compare
|
Something's wrong with travis, else green: |
|
thanks @topper-123 |
* master: BENCH: asv csv reading benchmarks no longer read StringIO objects off the end (pandas-dev#21807) BUG: df.agg, df.transform and df.apply use different methods when axis=1 than when axis=0 (pandas-dev#21224) BUG: bug in GroupBy.count where arg minlength passed to np.bincount must be None for np<1.13 (pandas-dev#21957) CLN: Vbench to asv conversion script (pandas-dev#22089) consistent docstring (pandas-dev#22066) TST: skip pytables test with not-updated pytables conda package (pandas-dev#22099) CLN: Remove Legacy MultiIndex Index Compatibility (pandas-dev#21740) DOC: Reword doc for filepath_or_buffer in read_csv (pandas-dev#22058) BUG: rolling with MSVC 2017 build (pandas-dev#21813)
|
so these are the warnings: https://travis-ci.org/pandas-dev/pandas/jobs/410445444 can you PR to fix this. I guess we have to switch on the version and actually pass 0 rather than None for > 113 |
…ust be None for np<1.13 (pandas-dev#21957)
git diff upstream/master -u -- "*.py" | flake8 --diffSee #21956 for details.